AITopics | policy search

Collaborating Authors

policy search

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Guided Policy Search via Approximate Mirror Descent

Neural Information Processing Systems | May-1-2026, 06:07:20 GMT

Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a "teacher" algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy search methods provide asymptotic local convergence guarantees by construction, but it is not clear how much the policy improves within a small, finite number of iterations. We show that guided policy search algorithms can be interpreted as an approximate variant of mirror descent, where the projection onto the constraint manifold is not exact. We derive a new guided policy search algorithm that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and show that in the more general nonlinear setting, the error in the projection step can be bounded. We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters.
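The alternation described in this abstract (improve a "teacher" distribution, then fit the policy to it by supervised learning) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a hypothetical single-state problem with three actions, a softmax policy, and an entropic mirror-descent step, and it makes the projection inexact simply by running only a few cross-entropy gradient steps. All function names and constants are illustrative.

```python
import math

def softmax(theta):
    """Softmax policy over actions for logits theta."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def improve_local(pi, cost, eta):
    """Teacher step: closed-form minimizer of <p, cost> + (1/eta) * KL(p || pi)."""
    w = [p * math.exp(-eta * c) for p, c in zip(pi, cost)]
    z = sum(w)
    return [x / z for x in w]

def project_supervised(theta, target, steps, lr):
    """Projection step: fit the softmax policy to the teacher with a few
    cross-entropy gradient steps, so the projection is only approximate."""
    theta = list(theta)
    for _ in range(steps):
        pi = softmax(theta)
        # Gradient of cross-entropy(target, softmax(theta)) w.r.t. theta is pi - target.
        theta = [t - lr * (p - q) for t, p, q in zip(theta, pi, target)]
    return theta

def expected_cost(pi, cost):
    return sum(p * c for p, c in zip(pi, cost))

def run(cost, iters=30, eta=1.0, steps=5, lr=0.5):
    """Alternate teacher improvement and approximate projection."""
    theta = [0.0] * len(cost)
    history = []
    for _ in range(iters):
        pi = softmax(theta)
        history.append(expected_cost(pi, cost))
        teacher = improve_local(pi, cost, eta)
        theta = project_supervised(theta, teacher, steps, lr)
    history.append(expected_cost(softmax(theta), cost))
    return theta, history

if __name__ == "__main__":
    theta, history = run([1.0, 0.2, 0.7])  # hypothetical per-action costs
    print("initial cost:", history[0])
    print("final cost:", history[-1])
    print("final policy:", softmax(theta))
```

Raising `steps` makes the projection closer to exact. The gap between the fitted policy and the teacher is a toy stand-in for the projection error that the abstract says can be bounded.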

local policy, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

661b1e76b95cc50a7a11a85619a67d95-Supplemental.pdf

Neural Information Processing Systems | Feb-8-2026, 17:06:56 GMT

international conference, trajectory, world model, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(16 more...)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

661b1e76b95cc50a7a11a85619a67d95-Paper.pdf

Neural Information Processing Systems | Feb-8-2026, 17:06:49 GMT

international conference, trajectory, world model, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
(16 more...)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Near Optimal Policy Optimization via REPS

Neural Information Processing Systems | Feb-7-2026, 09:43:31 GMT

Lemma 7. Let Assumptions 1, 2 and 3 hold.

artificial intelligence, arxiv preprint arxiv, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

Optimizing Energy Production Using Policy Search and Predictive State Representations

Neural Information Processing Systems | Dec-27-2025, 15:29:03 GMT

We consider the challenging practical problem of optimizing the power production of a complex of hydroelectric power plants, which involves control over three continuous action variables, uncertainty in the amount of water inflows and a variety of constraints that need to be satisfied. We propose a policy-search-based approach coupled with predictive modelling to address this problem. This approach has some key advantages compared to other alternatives, such as dynamic programming: the policy representation and search algorithm can conveniently incorporate domain knowledge; the resulting policies are easy to interpret, and the algorithm is naturally parallelizable. Our algorithm obtains a policy which outperforms the solution found by dynamic programming both quantitatively and qualitatively.

optimizing energy production, policy search, search and predictive state representation, (3 more...)

Neural Information Processing Systems

Industry: Energy > Power Industry (0.62)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)

Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs

Zhou, Yifan, Grover, Sachin, Mistiri, Mohamed El, Kalirathnam, Kamalesh, Kerhalkar, Pratyush, Mishra, Swaroop, Kumar, Neelesh, Gaurav, Sanket, Aran, Oya, Amor, Heni Ben

arXiv.org Artificial Intelligence | Dec-1-2025

Reinforcement Learning (RL) traditionally relies on scalar reward signals, limiting its ability to leverage the rich semantic knowledge often available in real-world tasks. In contrast, humans learn efficiently by combining numerical feedback with language, prior knowledge, and common sense. We introduce Prompted Policy Search (ProPS), a novel RL method that unifies numerical and linguistic reasoning within a single framework. Unlike prior work that augments existing RL components with language, ProPS places a large language model (LLM) at the center of the policy optimization loop, directly proposing policy updates based on both reward feedback and natural language input. We show that LLMs can perform numerical optimization in-context, and that incorporating semantic signals, such as goals, domain knowledge, and strategy hints, can lead to more informed exploration and sample-efficient learning. ProPS is evaluated across fifteen Gymnasium tasks, spanning classic control, Atari games, and MuJoCo environments, and compared to seven widely adopted RL algorithms (e.g., PPO, SAC, TRPO). It outperforms all baselines on eight out of fifteen tasks and demonstrates substantial gains when provided with domain knowledge. These results highlight the potential of unifying semantics and numerics for transparent, generalizable, and human-aligned RL.

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2511.21928

Country: North America (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Sports (0.93)
Leisure & Entertainment > Games > Computer Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Dual Policy Iteration

Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Bagnell

Neural Information Processing Systems | Nov-20-2025, 14:38:08 GMT

We also provide a general convergence analysis to support our empirical findings. Although our analysis is similar to CPI's, it has a key difference: as long as MBOC succeeds, we can provide a larger policy improvement than CPI at each iteration.

artificial intelligence, machine learning, reactive policy, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New Jersey (0.04)
North America > Canada > Quebec > Montreal (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

20563b8508ba42e1b688d922e926ee26-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 20:39:28 GMT

algorithm, fedavp, gradient, (15 more...)

Neural Information Processing Systems

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Virginia (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(4 more...)

Local policy search with Bayesian optimization

Sarah Müller

Neural Information Processing Systems | Oct-9-2025, 16:09:35 GMT

Nevertheless, instead of systematically reasoning and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high variance estimates and hence are sub-optimal in terms of sample complexity.
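The variance issue mentioned in this excerpt can be seen in a small sketch. It is not the paper's method: it assumes a hypothetical two-parameter quadratic objective and a plain Gaussian-perturbation gradient estimator, and it measures how far the estimate lands from the known true gradient for different sample counts. The objective, step size `sigma`, and all function names are illustrative.

```python
import random

def objective(theta):
    """Toy return (hypothetical): concave quadratic with its maximum at (1, -2)."""
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

def true_gradient(theta):
    """Analytic gradient of the toy objective, used only as a reference."""
    return [-2.0 * (theta[0] - 1.0), -2.0 * (theta[1] + 2.0)]

def perturbation_gradient(f, theta, n_samples, sigma, rng):
    """Average of (f(theta + sigma*eps) - f(theta)) / sigma * eps over Gaussian eps."""
    base = f(theta)
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        shifted = [t + sigma * e for t, e in zip(theta, eps)]
        scale = (f(shifted) - base) / sigma
        grad = [g + scale * e / n_samples for g, e in zip(grad, eps)]
    return grad

def estimator_spread(n_samples, repeats=200, seed=0):
    """Mean squared error of the estimate against the true gradient."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    target = true_gradient(theta)
    total = 0.0
    for _ in range(repeats):
        est = perturbation_gradient(objective, theta, n_samples, 0.1, rng)
        total += sum((a - b) ** 2 for a, b in zip(est, target))
    return total / repeats

if __name__ == "__main__":
    for n in (5, 50, 500):
        print(n, "samples -> mean squared error", estimator_spread(n))
```

The error shrinks only as more random perturbations are averaged, which is the sample-complexity cost the excerpt attributes to random sampling.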

artificial intelligence, machine learning, optimization, (15 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California (0.04)
(2 more...)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
